General matrix multiplication for `f32` and `f64` matrices. Operates on
matrices with general layout (arbitrary row and column strides).
This crate uses the same macro/microkernel approach to matrix multiplication as
the [BLIS][bl] project.
We presently provide a few good microkernels, both portable and x86-64
specific, and only one operation: the general matrix-matrix multiplication
(“gemm”).
[bl]: https://github.com/flame/blis
## Matrix Representation
**matrixmultiply** supports matrices with general stride, so a matrix
is passed using a pointer and four integers:
- `a: *const f32`, pointer to the first element in the matrix
- `m: usize`, number of rows
- `k: usize`, number of columns
- `rsa: isize`, row stride
- `csa: isize`, column stride
In this example, A is an m by k matrix. `a` is a pointer to the element at
index *0, 0*.
The *row stride* is the pointer offset (in number of elements) to the
element on the next row. It’s the distance from element *i, j* to *i + 1,
j*.
The *column stride* is the pointer offset (in number of elements) to the
element in the next column. It’s the distance from element *i, j* to *i,
j + 1*.
For example, for a contiguous matrix, row-major strides are *rsa=k,
csa=1* and column-major strides are *rsa=1, csa=m*.
Strides can be negative or even zero, but for a mutable matrix, elements
may not alias each other.
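
The stride rules above can be sketched in safe Rust. This is only an illustration of the addressing arithmetic, not code from the crate: the helper `element_at`, its `origin` parameter (the slice index of element *0, 0*), and `layouts_agree` are hypothetical names introduced here.

```rust
/// Read element (i, j) of a strided matrix stored in `data`.
/// `origin` is the index in `data` of element (0, 0).
/// (Hypothetical helper for illustration, not part of the crate.)
fn element_at(data: &[f32], origin: usize, rs: isize, cs: isize, i: usize, j: usize) -> f32 {
    // Offset in elements: i steps along the row stride, j along the column stride.
    let offset = origin as isize + i as isize * rs + j as isize * cs;
    data[offset as usize]
}

/// Check that three layouts describe the same 2 x 3 matrix [[1, 2, 3], [4, 5, 6]].
fn layouts_agree() -> bool {
    let (m, k) = (2usize, 3usize);
    // Row major: rs = k, cs = 1.
    let row_major: [f32; 6] = [1., 2., 3., 4., 5., 6.];
    // Column major: rs = 1, cs = m.
    let col_major: [f32; 6] = [1., 4., 2., 5., 3., 6.];
    // Rows stored bottom-up: element (0, 0) sits at index (m - 1) * k and rs = -k.
    let flipped: [f32; 6] = [4., 5., 6., 1., 2., 3.];
    for i in 0..m {
        for j in 0..k {
            let r = element_at(&row_major, 0, k as isize, 1, i, j);
            let c = element_at(&col_major, 0, 1, m as isize, i, j);
            let f = element_at(&flipped, (m - 1) * k, -(k as isize), 1, i, j);
            if r != c || r != f {
                return false;
            }
        }
    }
    true
}
```

With a negative row stride, the pointer to element *0, 0* is no longer the start of the allocation, which is why the sketch tracks `origin` separately.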
## Portability and Performance
- The default kernels are written in portable Rust and available
on all targets. These may depend on autovectorization to perform well.
- *x86* and *x86-64* features can be detected at runtime (the default) or at
compile time (if enabled), and the following kernel variants are
implemented:
  - `fma`
  - `avx`
  - `sse2`
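
As a rough sketch of what runtime detection looks like, the standard library macro `is_x86_feature_detected!` can be queried for each feature listed above. The function name `best_kernel_variant` and the preference order are assumptions made for this example; the crate's actual selection logic may differ.

```rust
/// Name the most capable kernel variant the running CPU supports.
/// (Hypothetical function; the preference order is an assumption.)
fn best_kernel_variant() -> &'static str {
    // The detection macro only exists on x86 and x86-64 targets.
    #[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
    {
        if std::is_x86_feature_detected!("fma") {
            return "fma";
        }
        if std::is_x86_feature_detected!("avx") {
            return "avx";
        }
        if std::is_x86_feature_detected!("sse2") {
            return "sse2";
        }
    }
    // Every other target falls back to the portable kernels.
    "portable"
}
```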
## Features
This crate can be used without the standard library (`#![no_std]`) by
disabling the default `std` feature. To do so, use this in your
`Cargo.toml`:
```toml
matrixmultiply = { version = "0.2", default-features = false }
```
Runtime CPU feature detection is available only when `std` is enabled.
Without the `std` feature, the crate uses special CPU features only if they
are enabled at compile time. (To enable CPU features at compile time, pass
the relevant
[`target-cpu`](https://doc.rust-lang.org/rustc/codegen-options/index.html#target-cpu)
or
[`target-feature`](https://doc.rust-lang.org/rustc/codegen-options/index.html#target-feature)
option to `rustc`.)
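
For comparison with runtime detection, compile-time features can be inspected with the built-in `cfg!` macro, which needs no standard library support. The sketch below reports which of the three features were enabled when the code was compiled, for instance by building with `-C target-cpu=native` or `-C target-feature=+avx,+fma`. The function name `compile_time_features` is hypothetical.

```rust
/// List which of the kernel-relevant features were enabled at compile time.
/// (Hypothetical helper for illustration, not part of the crate.)
fn compile_time_features() -> Vec<&'static str> {
    let mut enabled = Vec::new();
    // Each cfg! expands to a constant true or false for the current build.
    if cfg!(target_feature = "fma") {
        enabled.push("fma");
    }
    if cfg!(target_feature = "avx") {
        enabled.push("avx");
    }
    if cfg!(target_feature = "sse2") {
        enabled.push("sse2");
    }
    enabled
}
```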
## Other Notes
The functions in this crate are thread safe, as long as each concurrent call
writes to a distinct destination matrix.
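
One way to meet that condition is to give every thread its own band of rows of the destination. The sketch below shows the pattern with `std::thread::scope` and `chunks_mut`. Note that `naive_gemm` is a hypothetical stand-in written for this example (row-major, contiguous, C = A·B) so the code is self-contained; it is not the crate's kernel, and `parallel_gemm` is likewise an illustrative name. All dimensions and `rows_per_thread` are assumed to be nonzero.

```rust
/// Naive stand-in for a gemm call: C = A * B on row-major contiguous slices.
/// (Hypothetical; only here to keep the sketch self-contained.)
fn naive_gemm(m: usize, k: usize, n: usize, a: &[f32], b: &[f32], c: &mut [f32]) {
    for i in 0..m {
        for j in 0..n {
            let mut acc = 0.0;
            for p in 0..k {
                acc += a[i * k + p] * b[p * n + j];
            }
            c[i * n + j] = acc;
        }
    }
}

/// Multiply A (m x k) by B (k x n), one thread per band of rows of C.
/// Assumes m, k, n and rows_per_thread are all nonzero.
fn parallel_gemm(
    m: usize,
    k: usize,
    n: usize,
    a: &[f32],
    b: &[f32],
    rows_per_thread: usize,
) -> Vec<f32> {
    let mut c = vec![0.0f32; m * n];
    std::thread::scope(|s| {
        // chunks_mut yields non-overlapping bands of C, so no two threads
        // ever write to the same destination element.
        let a_bands = a.chunks(rows_per_thread * k);
        let c_bands = c.chunks_mut(rows_per_thread * n);
        for (a_band, c_band) in a_bands.zip(c_bands) {
            let rows = c_band.len() / n;
            s.spawn(move || naive_gemm(rows, k, n, a_band, b, c_band));
        }
    });
    c
}
```

The inputs A and B are shared read-only between threads, which is fine; only the destination has to be distinct per call.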